retrieval algorithm
SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs
Xu, Jiaming, Pan, Jiayi, Wang, Hanzhen, Zhou, Yongkang, Ye, Jiancai, Wang, Yu, Dai, Guohao
In this paper, we point out that the objective of the retrieval algorithms is to align with the LLM, which is similar to the objective of knowledge distillation in LLMs. We analyze the similarity in information focus between the distilled language model(DLM) and the original LLM from the perspective of information theory, and thus propose a novel paradigm that leverages a DLM as the retrieval algorithm. Based on the insight, we present SpeContext, an algorithm and system co-design for long-context reasoning. (1) At the algorithm level, SpeContext proposes lightweight retrieval head based on the head-level attention weights of DLM, achieving > 90% parameters reduction by pruning the redundancy. (2) At the system level, SpeContext designs an asynchronous prefetch dataflow via the elastic loading strategy, effectively overlapping KV cache retrieval with the LLM computation. (3) At the compilation level, SpeContext constructs the theoretical memory model and implements an adaptive memory management system to achieve acceleration by maximizing GPU memory utilization. We deploy and evaluate SpeContext in two resourceconstrained environments, cloud and edge. Extensive experiments show that, compared with the Huggingface framework, SpeContext achieves up to 24.89x throughput improvement in cloud and 10.06x speedup in edge with negligible accuracy loss, pushing the Pareto frontier of accuracy and throughput.
SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation
Cai, Yuzheng, Guo, Zhenyue, Pei, Yiwen, Bian, Wanrui, Zheng, Weiguo
Recent advancements in large language models (LLMs) have shown impressive versatility across various tasks. To eliminate its hallucinations, retrieval-augmented generation (RAG) has emerged as a powerful approach, leveraging external knowledge sources like knowledge graphs (KGs). In this paper, we study the task of KG-driven RAG and propose a novel Similar Graph Enhanced Retrieval-Augmented Generation (SimGRAG) method. It effectively addresses the challenge of aligning query texts and KG structures through a two-stage process: (1) query-to-pattern, which uses an LLM to transform queries into a desired graph pattern, and (2) pattern-to-subgraph, which quantifies the alignment between the pattern and candidate subgraphs using a graph semantic distance (GSD) metric. We also develop an optimized retrieval algorithm that efficiently identifies the top-$k$ subgraphs within 1-second latency on a 10-million-scale KG. Extensive experiments show that SimGRAG outperforms state-of-the-art KG-driven RAG methods in both question answering and fact verification, offering superior plug-and-play usability and scalability.
DECO: Life-Cycle Management of Enterprise-Grade Chatbots
Zhu, Yiwen, Demarne, Mathieu, Deng, Kai, Wang, Wenjing, Sahoo, Nutan, Vermareddy, Divya, Lerner, Hannah, Lu, Yunlei, Bararia, Swati, Bhavan, Anjali, Zhang, William, Li, Xia, Lin, Katherine, Cilimdzic, Miso, Krishnan, Subru
Software engineers frequently grapple with the challenge of accessing disparate documentation and telemetry data, including Troubleshooting Guides (TSGs), incident reports, code repositories, and various internal tools developed by multiple stakeholders. While on-call duties are inevitable, incident resolution becomes even more daunting due to the obscurity of legacy sources and the pressures of strict time constraints. To enhance the efficiency of on-call engineers (OCEs) and streamline their daily workflows, we introduced DECO -- a comprehensive framework for developing, deploying, and managing enterprise-grade chatbots tailored to improve productivity in engineering routines. This paper details the design and implementation of the DECO framework, emphasizing its innovative NL2SearchQuery functionality and a hierarchical planner. These features support efficient and customized retrieval-augmented-generation (RAG) algorithms that not only extract relevant information from diverse sources but also select the most pertinent toolkits in response to user queries. This enables the addressing of complex technical questions and provides seamless, automated access to internal resources. Additionally, DECO incorporates a robust mechanism for converting unstructured incident logs into user-friendly, structured guides, effectively bridging the documentation gap. Feedback from users underscores DECO's pivotal role in simplifying complex engineering tasks, accelerating incident resolution, and bolstering organizational productivity. Since its launch in September 2023, DECO has demonstrated its effectiveness through extensive engagement, with tens of thousands of interactions from hundreds of active users across multiple organizations within the company.
Recursive Abstractive Processing for Retrieval in Dynamic Datasets
Chucri, Charbel, Azouz, Rami, Ott, Joachim
Recent retrieval-augmented models enhance basic methods by building a hierarchical structure over retrieved text chunks through recursive embedding, clustering, and summarization. The most relevant information is then retrieved from both the original text and generated summaries. However, such approaches face limitations with dynamic datasets, where adding or removing documents over time complicates the updating of hierarchical representations formed through clustering. We propose a new algorithm to efficiently maintain the recursive-abstractive tree structure in dynamic datasets, without compromising performance. Additionally, we introduce a novel post-retrieval method that applies query-focused recursive abstractive processing to substantially improve context quality. Our method overcomes the limitations of other approaches by functioning as a black-box post-retrieval layer compatible with any retrieval algorithm. Both algorithms are validated through extensive experiments on real-world datasets, demonstrating their effectiveness in handling dynamic data and improving retrieval performance.
LePaRD: A Large-Scale Dataset of Judges Citing Precedents
Mahari, Robert, Stammbach, Dominik, Ash, Elliott, Pentland, Alex `Sandy'
We present the Legal Passage Retrieval Dataset LePaRD. LePaRD is a massive collection of U.S. federal judicial citations to precedent in context. The dataset aims to facilitate work on legal passage prediction, a challenging practice-oriented legal retrieval and reasoning task. Legal passage prediction seeks to predict relevant passages from precedential court decisions given the context of a legal argument. We extensively evaluate various retrieval approaches on LePaRD, and find that classification appears to work best. However, we note that legal precedent prediction is a difficult task, and there remains significant room for improvement. We hope that by publishing LePaRD, we will encourage others to engage with a legal NLP task that promises to help expand access to justice by reducing the burden associated with legal research. A subset of the LePaRD dataset is freely available and the whole dataset will be released upon publication.
A Deep Learning Architecture for Passive Microwave Precipitation Retrievals using CloudSat and GPM Data
Rahimi, Reyhaneh, Vahedizadeh, Sajad, Ebtehaj, Ardeshir
This paper presents an algorithm that relies on a series of dense and deep neural networks for passive microwave retrieval of precipitation. The neural networks learn from coincidences of brightness temperatures from the Global Precipitation Measurement (GPM) Microwave Imager (GMI) with the active precipitating retrievals from the Dual-frequency Precipitation Radar (DPR) onboard GPM as well as those from the {CloudSat} Profiling Radar (CPR). The algorithm first detects the precipitation occurrence and phase and then estimates its rate, while conditioning the results to some key ancillary information including parameters related to cloud microphysical properties. The results indicate that we can reconstruct the DPR rainfall and CPR snowfall with a detection probability of more than 0.95 while the probability of a false alarm remains below 0.08 and 0.03, respectively. Conditioned to the occurrence of precipitation, the unbiased root mean squared error in estimation of rainfall (snowfall) rate using DPR (CPR) data is less than 0.8 (0.1) mm/hr over oceans and land. Beyond methodological developments, comparing the results with ERA5 reanalysis and official GPM products demonstrates that the uncertainty in global satellite snowfall retrievals continues to be large while there is a good agreement among rainfall products. Moreover, the results indicate that CPR active snowfall data can improve passive microwave estimates of global snowfall while the current CPR rainfall retrievals should only be used for detection and not estimation of rates.
Guided Unsupervised Learning by Subaperture Decomposition for Ocean SAR Image Retrieval
Ristea, Nicolae-Cฤtฤlin, Anghel, Andrei, Datcu, Mihai, Chapron, Bertrand
Spaceborne synthetic aperture radar (SAR) can provide accurate images of the ocean surface roughness day-or-night in nearly all weather conditions, being an unique asset for many geophysical applications. Considering the huge amount of data daily acquired by satellites, automated techniques for physical features extraction are needed. Even if supervised deep learning methods attain state-of-the-art results, they require great amount of labeled data, which are difficult and excessively expensive to acquire for ocean SAR imagery. To this end, we use the subaperture decomposition (SD) algorithm to enhance the unsupervised learning retrieval on the ocean surface, empowering ocean researchers to search into large ocean databases. We empirically prove that SD improve the retrieval precision with over 20% for an unsupervised transformer auto-encoder network. Moreover, we show that SD brings important performance boost when Doppler centroid images are used as input data, leading the way to new unsupervised physics guided retrieval algorithms.
Retrieving Black-box Optimal Images from External Databases
Suppose we have a black-box function (e.g., deep neural network) that takes an image as input and outputs a value that indicates preference. How can we retrieve optimal images with respect to this function from an external database on the Internet? Standard retrieval problems in the literature (e.g., item recommendations) assume that an algorithm has full access to the set of items. In other words, such algorithms are designed for service providers. In this paper, we consider the retrieval problem under different assumptions. Specifically, we consider how users with limited access to an image database can retrieve images using their own black-box functions. This formulation enables a flexible and finer-grained image search defined by each user. We assume the user can access the database through a search query with tight API limits. Therefore, a user needs to efficiently retrieve optimal images in terms of the number of queries. We propose an efficient retrieval algorithm Tiara for this problem. In the experiments, we confirm that our proposed method performs better than several baselines under various settings.
Unfolded Algorithms for Deep Phase Retrieval
Naimipour, Naveed, Khobahi, Shahin, Soltanalian, Mojtaba
Exploring the idea of phase retrieval has been intriguing researchers for decades, due to its appearance in a wide range of applications. The task of a phase retrieval algorithm is typically to recover a signal from linear phaseless measurements. In this paper, we approach the problem by proposing a hybrid model-based data-driven deep architecture, referred to as Unfolded Phase Retrieval (UPR), that exhibits significant potential in improving the performance of state-of-the art data-driven and model-based phase retrieval algorithms. The proposed method benefits from versatility and interpretability of well-established model-based algorithms, while simultaneously benefiting from the expressive power of deep neural networks. In particular, our proposed model-based deep architecture is applied to the conventional phase retrieval problem (via the incremental reshaped Wirtinger flow algorithm) and the sparse phase retrieval problem (via the sparse truncated amplitude flow algorithm), showing immense promise in both cases. Furthermore, we consider a joint design of the sensing matrix and the signal processing algorithm and utilize the deep unfolding technique in the process. Our numerical results illustrate the effectiveness of such hybrid model-based and data-driven frameworks and showcase the untapped potential of data-aided methodologies to enhance the existing phase retrieval algorithms.
u-net CNN based fourier ptychography
Chen, Yican, Luo, Zhi, Wu, Xia, Yang, Huidong, Huang, Bo
Fourier ptychography is a recently explored imaging method for overcoming the diffraction limit of conventional cameras with applications in microscopy and yielding high-resolution images. In order to splice together low-resolution images taken under different illumination angles of coherent light source, an iterative phase retrieval algorithm is adopted. However, the reconstruction procedure is slow and needs a good many of overlap in the Fourier domain for the continuous recorded low-resolution images and is also worse under system aberrations such as noise or random update sequence. In this paper, we propose a new retrieval algorithm that is based on convolutional neural networks. Once well trained, our model can perform high-quality reconstruction rapidly by using the graphics processing unit. The experiments demonstrate that our model achieves better reconstruction results and is more robust under system aberrations.